{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XpQMDcKy-fa0"
      },
      "source": [
        "# Welcome to BLING & DRAGON Model Zoo Sampler by LLMWare.ai 🦾\n",
        "This is the 2nd tutorial of our Agents Fast Start Series. So please fasten your seat belts and get ready to dive in Bling and Dragon models.\n",
        "\n",
        "\n",
        "## Get started with local inferences in minutes...\n",
        "\n",
        "This example is a \"hello world\" model zoo sampler for LLMWare BLING and DRAGON models, and shows a simple epeatable recipe for prompting models locally using a provided set of sample context passages and questions.\n",
        "\n",
        "It is designed to be easy to select among different LLMWARE models in the BLING (Best Little Instruct No-GPU) and DRAGON (Delivering RAG On ...) model families."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kWJY0nLC-fzM"
      },
      "source": [
        "## But what are Bling and Dragon models??\n",
        "Both Bling and Dragon models have been fine-tuned on complex business, financial and legal materials, with specific focus on accurate fact-based question-answering for a context passage.\n",
        "\n",
        "**BLING models** vary between 0.5B - 3.8B parameters and have been specifically designed to run on a CPU, including on local laptops.\n",
        "\n",
        "**DRAGON models** vary between 6-9B parameters, and when quantized, can generally run OK on most CPU-based laptops, but are designed for optimal use on a GPU or inference server.   These models operate on the same principles as the BLING models, making it easy to 'test' with BLING, and then 'upgrade' to DRAGON in shifting into production environment for greater accuracy.\n",
        "\n",
        "### Key training objectives:\n",
        "- **Fact Reliance**: Answers are grounded in the provided context; without it, the model often returns \"Not Found,\" reducing hallucinations and making it ideal for RAG and workflow tasks.\n",
        "\n",
        "- **Concise Answers**: Responses are short and factual—optimized for speed, integration, and automation rather than chat-like dialogue.\n",
        "\n",
        "- **Negative Sampling**: Trained to respond with \"Not Found\" when the context doesn’t contain the answer, aiding in multi-passage retrieval and filtering.\n",
        "\n",
        "- **Instruction-Free**: No system prompts or role instructions needed—just context and a question."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "yU-iCADREXj4"
      },
      "outputs": [],
      "source": [
        "# install dependencies\n",
        "!pip3 install llmware # if you're getting error, then try upgrading the pip by runnnig this command : python -m pip install --upgrade pip"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MbFpA7pWEZok"
      },
      "source": [
        "If you have any dependency install issues, please review the README, docs link, or raise an Issue.\n",
        "\n",
        "Usually, if there is a missing dependency, the code will give the warning - and a clear direction like `pip install transformers'` required for this example, etc.\n",
        "\n",
        "As an alternative to pip install ... if you prefer, you can also clone the repo from github which provides a benefit of having access to 100+ examples.\n",
        "\n",
        "To clone the repo:\n",
        "```\n",
        "git clone \"https://www.github.com/llmware-ai/llmware.git\"\n",
        "sh \"welcome_to_llmware.sh\"\n",
        "```\n",
        "\n",
        "The second script `\"welcome_to_llmware.sh\"` will install all of the dependencies.\n",
        "\n",
        "If using Windows, then use the `\"welcome_to_llmware_windows.sh\"` script."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "otqo-DNMItUc"
      },
      "outputs": [],
      "source": [
        "# Now's let jump into the code part!\n",
        "import time # To see how much time a model takes to load\n",
        "from llmware.prompts import Prompt # A class (or function) inside the prompts module that helps you build or manage prompts for LLMs."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DpwHOs_ILM5d"
      },
      "source": [
        "Now below is a function that contains a few test cases. You can add your own test cases as well.\n",
        "\n",
        "To adapt this test set, just create your own list with dictionary entries and keys 'query', 'answer' and 'context'."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "I8-bR172JT1K"
      },
      "outputs": [],
      "source": [
        "def hello_world_questions():\n",
        "\n",
        "    \"\"\"\n",
        "    Representative test set - we would recommend running this script as a 'hello world' test on the first\n",
        "    try that you use a model, and then adapt the content to your own set of context and questions.\n",
        "\n",
        "    There is nothing special about these questions, and in fact, you will note that many of the models will get\n",
        "    a couple of answers wrong (especially the ~1B parameter models).   The errors are important insights\n",
        "    as you evaluate which models to consider for your use case.\n",
        "    \"\"\"\n",
        "\n",
        "    test_list = [\n",
        "\n",
        "    {\"query\": \"What is the total amount of the invoice?\",\n",
        "     \"answer\": \"$22,500.00\",\n",
        "     \"context\": \"Services Vendor Inc. \\n100 Elm Street Pleasantville, NY \\nTO Alpha Inc. 5900 1st Street \"\n",
        "                \"Los Angeles, CA \\nDescription Front End Engineering Service $5000.00 \\n Back End Engineering\"\n",
        "                \" Service $7500.00 \\n Quality Assurance Manager $10,000.00 \\n Total Amount $22,500.00 \\n\"\n",
        "                \"Make all checks payable to Services Vendor Inc. Payment is due within 30 days.\"\n",
        "                \"If you have any questions concerning this invoice, contact Bia Hermes. \"\n",
        "                \"THANK YOU FOR YOUR BUSINESS!  INVOICE INVOICE # 0001 DATE 01/01/2022 FOR Alpha Project P.O. # 1000\"},\n",
        "\n",
        "    {\"query\": \"What was the amount of the trade surplus?\",\n",
        "     \"answer\": \"62.4 billion yen ($416.6 million)\",\n",
        "     \"context\": \"Japan’s September trade balance swings into surplus, surprising expectations\"\n",
        "                \"Japan recorded a trade surplus of 62.4 billion yen ($416.6 million) for September, \"\n",
        "                \"beating expectations from economists polled by Reuters for a trade deficit of 42.5 \"\n",
        "                \"billion yen. Data from Japan’s customs agency revealed that exports in September \"\n",
        "                \"increased 4.3% year on year, while imports slid 16.3% compared to the same period \"\n",
        "                \"last year. According to FactSet, exports to Asia fell for the ninth straight month, \"\n",
        "                \"which reflected ongoing China weakness. Exports were supported by shipments to \"\n",
        "                \"Western markets, FactSet added. — Lim Hui Jie\"},\n",
        "\n",
        "    {\"query\": \"What was Microsoft's revenue in the 3rd quarter?\",\n",
        "     \"answer\": \"$52.9 billion\",\n",
        "     \"context\": \"Microsoft Cloud Strength Drives Third Quarter Results \\nREDMOND, Wash. — April 25, 2023 — \"\n",
        "                \"Microsoft Corp. today announced the following results for the quarter ended March 31, 2023,\"\n",
        "                \" as compared to the corresponding period of last fiscal year:\\n· Revenue was $52.9 billion\"\n",
        "                \" and increased 7% (up 10% in constant currency)\\n· Operating income was $22.4 billion \"\n",
        "                \"and increased 10% (up 15% in constant currency)\\n· Net income was $18.3 billion and \"\n",
        "                \"increased 9% (up 14% in constant currency)\\n· Diluted earnings per share was $2.45 \"\n",
        "                \"and increased 10% (up 14% in constant currency).\\n\"},\n",
        "\n",
        "    {\"query\": \"When did the LISP machine market collapse?\",\n",
        "     \"answer\": \"1987.\",\n",
        "     \"context\": \"The attendees became the leaders of AI research in the 1960s.\"\n",
        "                \"  They and their students produced programs that the press described as 'astonishing': \"\n",
        "                \"computers were learning checkers strategies, solving word problems in algebra, \"\n",
        "                \"proving logical theorems and speaking English.  By the middle of the 1960s, research in \"\n",
        "                \"the U.S. was heavily funded by the Department of Defense and laboratories had been \"\n",
        "                \"established around the world. Herbert Simon predicted, 'machines will be capable, \"\n",
        "                \"within twenty years, of doing any work a man can do'.  Marvin Minsky agreed, writing, \"\n",
        "                \"'within a generation ... the problem of creating 'artificial intelligence' will \"\n",
        "                \"substantially be solved'. They had, however, underestimated the difficulty of the problem.  \"\n",
        "                \"Both the U.S. and British governments cut off exploratory research in response \"\n",
        "                \"to the criticism of Sir James Lighthill and ongoing pressure from the US Congress \"\n",
        "                \"to fund more productive projects. Minsky's and Papert's book Perceptrons was understood \"\n",
        "                \"as proving that artificial neural networks approach would never be useful for solving \"\n",
        "                \"real-world tasks, thus discrediting the approach altogether.  The 'AI winter', a period \"\n",
        "                \"when obtaining funding for AI projects was difficult, followed.  In the early 1980s, \"\n",
        "                \"AI research was revived by the commercial success of expert systems, a form of AI \"\n",
        "                \"program that simulated the knowledge and analytical skills of human experts. By 1985, \"\n",
        "                \"the market for AI had reached over a billion dollars. At the same time, Japan's fifth \"\n",
        "                \"generation computer project inspired the U.S. and British governments to restore funding \"\n",
        "                \"for academic research. However, beginning with the collapse of the Lisp Machine market \"\n",
        "                \"in 1987, AI once again fell into disrepute, and a second, longer-lasting winter began.\"},\n",
        "\n",
        "    {\"query\": \"When will employment start?\",\n",
        "     \"answer\": \"April 16, 2012.\",\n",
        "     \"context\": \"THIS EXECUTIVE EMPLOYMENT AGREEMENT (this “Agreement”) is entered \"\n",
        "                \"into this 2nd day of April, 2012, by and between Aphrodite Apollo \"\n",
        "                \"(“Executive”) and TestCo Software, Inc. (the “Company” or “Employer”), \"\n",
        "                \"and shall become effective upon Executive’s commencement of employment \"\n",
        "                \"(the “Effective Date”) which is expected to commence on April 16, 2012. \"\n",
        "                \"The Company and Executive agree that unless Executive has commenced \"\n",
        "                \"employment with the Company as of April 16, 2012 (or such later date as \"\n",
        "                \"agreed by each of the Company and Executive) this Agreement shall be \"\n",
        "                \"null and void and of no further effect.\"}\n",
        "    ]\n",
        "\n",
        "    return test_list"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Rczsud2kOhqp"
      },
      "source": [
        "Now, let's try to run these test cases with different models of Dragon and Bling series.\n",
        "\n",
        "We will be printing:\n",
        "- LLM Response\n",
        "- Gold Answer\n",
        "- LLM Usage"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "UpcI5QIjOzSQ"
      },
      "outputs": [],
      "source": [
        "def llmware_bling_dragon_hello_world (model_name):\n",
        "\n",
        "    \"\"\" Simple inference loop that loads a model and runs through a series of test questions. \"\"\"\n",
        "\n",
        "    t0 = time.time()\n",
        "    test_list = hello_world_questions()\n",
        "\n",
        "    print(f\"\\n > Loading Model: {model_name}...\")\n",
        "\n",
        "    #   please note that by default, we recommend setting temperature=0.0 and sample=False for fact-based RAG\n",
        "    prompter = Prompt().load_model(model_name, temperature=0.0, sample=False)\n",
        "\n",
        "    t1 = time.time()\n",
        "    print(f\"\\n > Model {model_name} load time: {t1-t0} seconds\")\n",
        "\n",
        "    for i, entries in enumerate(test_list):\n",
        "        print(f\"\\n{i+1}. Query: {entries['query']}\")\n",
        "\n",
        "        # run the prompt\n",
        "        output = prompter.prompt_main(entries[\"query\"],context=entries[\"context\"], prompt_name=\"default_with_context\")\n",
        "\n",
        "        llm_response = output[\"llm_response\"].strip(\"\\n\")\n",
        "        print(f\"LLM Response: {llm_response}\")\n",
        "        print(f\"Given Answer: {entries['answer']}\")\n",
        "        print(f\"LLM Usage: {output['usage']}\")\n",
        "\n",
        "    t2 = time.time()\n",
        "    print(f\"\\nTotal processing time: {t2-t1} seconds\")\n",
        "\n",
        "    return 0"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KALiWrOtO7N9"
      },
      "source": [
        "Below is the main function with a series of models listed. Select the models one by one by just replacing the name of `my_model = bling_gguf[1]` to the one you want to use with the index i.e. the place where the model you want to test is present.\n",
        "\n",
        "For example, if I want to run dragon_gguf -> \"dragon-mistral-answer-tool\" model which is present at position 3 or index 2 (indexing starts from 0), therefore we will use `my_model = dragon_gguf[2]`"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "tuNXtu4GPDoV"
      },
      "outputs": [],
      "source": [
        "if __name__ == \"__main__\":\n",
        "\n",
        "    bling_pytorch = [\n",
        "\n",
        "        #   pytorch models - will run fast on GPU, and smaller ones good for CPU only\n",
        "        #   note: you will need to install pytorch and transformers to pull and access these models\n",
        "\n",
        "        \"llmware/bling-1b-0.1\",\n",
        "        \"llmware/bling-tiny-llama-v0\",\n",
        "        \"llmware/bling-1.4b-0.1\",\n",
        "        \"llmware/bling-falcon-1b-0.1\",\n",
        "        \"llmware/bling-cerebras-1.3b-0.1\",\n",
        "        \"llmware/bling-sheared-llama-1.3b-0.1\",\n",
        "        \"llmware/bling-sheared-llama-2.7b-0.1\",\n",
        "        \"llmware/bling-red-pajamas-3b-0.1\",\n",
        "        \"llmware/bling-stable-lm-3b-4e1t-v0\",\n",
        "        \"llmware/bling-phi-3\",\n",
        "        \"llmware/bling-phi-3.5\"\n",
        "    ]\n",
        "\n",
        "    dragon_pytorch = [\n",
        "\n",
        "        #   pytorch models - intended for GPU server use - will require pytorch, transformers, and in some cases,\n",
        "        #   other dependencies (einops, flash_attn).\n",
        "\n",
        "        \"llmware/dragon-mistral-7b-v0\",\n",
        "        \"llmware/dragon-yi-6b-v0\",\n",
        "        \"llmware/dragon-qwen-7b\",\n",
        "        \"llmware/dragon-llama-7b-v0\",\n",
        "        \"llmware/dragon-mistral-0.3\",\n",
        "        \"llmware/dragon-llama-3.1\",\n",
        "        \"llmware/dragon-deci-7b-v0\"\n",
        "    ]\n",
        "\n",
        "    bling_gguf = [\n",
        "\n",
        "        #   smaller cpu-oriented models - optimal for running on a CPU\n",
        "\n",
        "        \"bling-phi-3.5-gguf\",       # **NEW** - phi-3.5 (3.8b)\n",
        "        \"bling-answer-tool\",        # this is quantized bling-tiny-llama (1.1b)\n",
        "        \"bling-qwen-0.5b-gguf\",     # **NEW** - qwen2 (0.5b)\n",
        "        \"bling-qwen-1.5b-gguf\",     # **NEW** - qwen2 (1.5b)\n",
        "        \"bling-stablelm-3b-tool\",   # quantized bling-stablelm-3b (2.7b)\n",
        "        \"bling-phi-3-gguf\",         # quantized phi-3 (3.8b)\n",
        "        \"bling-phi-2-gguf\",         # quantized phi-2 (2.7b)\n",
        "        ]\n",
        "\n",
        "    dragon_gguf = [\n",
        "\n",
        "        #   larger models - 6b - 9b\n",
        "\n",
        "        \"dragon-yi-answer-tool\",     # quantized yi-6b (v1) (6b)\n",
        "        \"dragon-llama-answer-tool\",\n",
        "        \"dragon-mistral-answer-tool\",\n",
        "        \"dragon-qwen-7b-gguf\",          # **NEW** qwen2-7b (7b)\n",
        "        \"dragon-yi-9b-gguf\",            # **NEW** yi-9b (8.8b)\n",
        "        \"dragon-llama-3.1-gguf\",\n",
        "        \"dragon-mistral-0.3-gguf\"\n",
        "\n",
        "    ]\n",
        "\n",
        "    #   for most use cases, we would recommend using the GGUF for faster inference\n",
        "    #   NEW - if you are running on a Windows machine, then try substituting for one of the following:\n",
        "    #    -- \"bling-tiny-llama-ov\" -> uses OpenVino model version - requires `pip install openvino` and `pip install openvino_genai`\n",
        "    #    -- \"bling-tiny-llama-onnx\" -> uses ONNX model version - requires `pip install onnxruntime_genai`\n",
        "\n",
        "    my_model = bling_gguf[1]\n",
        "\n",
        "    llmware_bling_dragon_hello_world(my_model)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7EM24n7gMxWw"
      },
      "source": [
        "# Final Notes 🌪️\n",
        "\n",
        "We are always updating the BLING and DRAGON model collection with new models, including improvements in the training techniques and experimenting with new base models, and try to keep this script updated as a good 'hello world' sandbox entry point for testing and evaluation.\n",
        "\n",
        "We have trained these models on a wide range of base foundation models to also support preferences among specific users and clients for a particular base model.\n",
        "\n",
        "We score each of these models on a RAG benchmark test for accuracy and a number of specialized metrics. Please see the example \"`get_model_benchmarks`\" for a view of this.\n",
        "\n",
        "Loved it?? This is just an example of our models. Please check out our other Agentic AI examples with every model in detail here: https://github.com/llmware-ai/llmware/blob/main/fast_start/agents/agents-2-llmware_model_sampler_bling_dragon.py\n",
        "\n",
        "Also, if you have more interest in RAG, then please go with our RAG examples, which you can find here: https://github.com/llmware-ai/llmware/tree/main/fast_start/rag\n",
        "\n",
        "If you liked it, then please **star our repo** https://github.com/llmware-ai/llmware ⭐\n",
        "\n",
        "Any doubts?? Join our **discord server: https://discord.gg/GN49aWx2H3** 🫂"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
